# Chinese Multimodal
Chinese CLIP ViT Base Patch16
A Chinese CLIP model built on the ViT architecture that supports joint understanding of images and text
Text-to-Image
Transformers

Xenova
264
1
Mengzi Oscar Base Caption
Apache-2.0
A Chinese multimodal image captioning model fine-tuned on the AIC-ICC Chinese image caption dataset, based on the Mengzi-Oscar pretrained model
Image-to-Text
Transformers, Chinese

Langboat
23
2
Featured Recommended AI Models